# MapReduce job: java.lang.OutOfMemoryError: GC overhead limit exceeded

The default sort memory is 100 MB (io.sort.mb) with a merge factor of 10 (io.sort.factor), and the machine has 4 GB of physical memory.
The crawldb holds tens of millions of entries and is about 5 GB on disk; the hostdb is about 3 GB; topN = 2,000,000. The stock Generator completes without problems, but NewGenerator fails as follows:

```
2016-08-09 21:10:07,211 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local179611962_0002
java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.HashMap.createEntry(HashMap.java:897)
	at java.util.HashMap.addEntry(HashMap.java:884)
	at java.util.HashMap.put(HashMap.java:505)
	at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:191)
	at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:317)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
	at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1421)
	at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1361)
	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:220)
	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:216)
	at org.apache.nutch.crawl.NewGenerator$Finder.reduce(NewGenerator.java:219)
	at org.apache.nutch.crawl.NewGenerator$Finder.reduce(NewGenerator.java:1)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2016-08-09 21:10:08,488 ERROR crawl.NewGenerator (NewGenerator.java:run(992)) - Generator: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
	at org.apache.nutch.crawl.NewGenerator.generate(NewGenerator.java:764)
	at org.apache.nutch.crawl.NewGenerator.run(NewGenerator.java:987)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.crawl.NewGenerator.main(NewGenerator.java:941)
```
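The trace shows the heap being exhausted while `MapWritable.readFields` (called from `CrawlDatum.readFields`) allocates HashMap entries as `NewGenerator$Finder.reduce` iterates its values, all inside the single-JVM LocalJobRunner. A minimal sketch of one common mitigation, assuming the old `mapred` JobConf API that the trace shows in use: raise the per-task heap and keep the sort buffer modest (the class and values below are illustrative, not from the source).

```java
import org.apache.hadoop.mapred.JobConf;

/**
 * A minimal sketch (not the actual NewGenerator code) of one common
 * mitigation for "GC overhead limit exceeded": give the task JVMs more
 * heap so the reduce side can hold the CrawlDatum values it deserializes.
 */
public class GeneratorMemoryTuning {
    public static void apply(JobConf job) {
        // Per-task child JVM heap; the Hadoop 1.x default (-Xmx200m)
        // is easily exhausted when topN is in the millions.
        job.set("mapred.child.java.opts", "-Xmx1024m");
        // Keep the in-memory sort buffer (MB) modest so more heap is
        // left for reduce-side objects; 100 is the default noted above.
        job.setInt("io.sort.mb", 100);
    }
}
```

Note that under LocalJobRunner the tasks run inside the client JVM, so `mapred.child.java.opts` may not take effect in local mode; there the heap of the launching JVM itself has to be raised instead (for example via a larger `-Xmx` in HADOOP_OPTS, or the NUTCH_HEAPSIZE environment variable for the `bin/nutch` script).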